ContrApption

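ContrApption lets users create a boxplot that responds to a dropdown menu, without writing any JavaScript and with minimal R code. It requires 3 inputs: a numeric dataset (data), an annotation data frame describing the samples (annotation), and the name of the annotation column used to group the samples (groupCol).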

Note that the row names of the annotation data must match the column names of the numeric data. This reflects the requirements of the RNA-Seq experiments for which ContrApption was designed. An intuitive visualization of this data structure can be found on page 5 of the Beginner’s guide to using the DESeq2 package. There is no reason users from other domains with data in the same structure cannot use the tool, however, as the inputs required are simply strings and numbers.
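
The naming requirement above can be checked before building a widget. The following is a minimal sketch in base R on made-up toy inputs; the object names (counts_toy, annotation_toy) are hypothetical, and whether ContrApption also requires the names to be in the same order is an assumption, so the sketch reorders the columns to be safe.

```r
# Toy inputs (hypothetical): 2 features x 3 samples
counts_toy <- data.frame(
  s1 = c(5, 0), s2 = c(7, 2), s3 = c(6, 1),
  row.names = c("geneA", "geneB")
)
annotation_toy <- data.frame(
  condition = c("treated", "untreated", "treated"),
  row.names = c("s3", "s1", "s2")  # deliberately in a different order
)

# do both inputs describe the same set of samples?
same_samples <- setequal(colnames(counts_toy), rownames(annotation_toy))

# reorder the numeric columns so they follow the annotation rows
counts_toy <- counts_toy[, rownames(annotation_toy), drop = FALSE]

# after reordering, names should agree position by position
aligned <- identical(colnames(counts_toy), rownames(annotation_toy))
```

If same_samples is FALSE, the two inputs describe different samples and reordering alone will not fix the mismatch.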

Dependencies

We load the required libraries:

devtools::install()
library(ContrApption)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# libraries for optional integration
library(DT)
library(crosstalk)

suppressMessages(library(DESeq2))
suppressMessages(library(pasilla))

Dataset

This vignette uses DESeq2 and the pasilla dataset to demonstrate the intended use of ContrApption. The following is a modified version of the “Count matrix input” section of the DESeq2 vignette.

Annotation Data and Groups

The following snippet retrieves the column metadata for the Pasilla experiment.

# loads sample annotation from the pasilla package
pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
                       package = "pasilla", mustWork = TRUE)

# read in the sample data
coldata <- read.csv(pasAnno, row.names = 1)

# select relevant columns
coldata <- data.frame(coldata[ , c("condition","type")])

# remove un-needed characters
rownames(coldata) <- gsub("fb", "", rownames(coldata))

coldata
##            condition        type
## treated1     treated single-read
## treated2     treated  paired-end
## treated3     treated  paired-end
## untreated1 untreated single-read
## untreated2 untreated single-read
## untreated3 untreated  paired-end
## untreated4 untreated  paired-end

This data frame will be our annotation input. Here the row names are the sample names of the experiment, and either ‘condition’ or ‘type’ may be used as the grouping column for groupCol.

Numeric Data

The Pasilla dataset contains per-exon and per-gene read counts of RNA-seq samples, which will be our data input. The exons/genes are the rows, and the samples in the experiment are the columns. The following code extracts the counts from the package.

# loads counts file from the pasilla package 
pasCts <- system.file("extdata", "pasilla_gene_counts.tsv",
                      package = "pasilla", mustWork = TRUE)

# reads counts of genes
cts <- read.csv(pasCts, sep = "\t", row.names = "gene_id")

# make sure the order is correct
cts <- cts[, rownames(coldata)]

head(cts)
##             treated1 treated2 treated3 untreated1 untreated2 untreated3
## FBgn0000003        0        0        1          0          0          0
## FBgn0000008      140       88       70         92        161         76
## FBgn0000014        4        0        0          5          1          0
## FBgn0000015        1        0        0          0          2          1
## FBgn0000017     6205     3072     3334       4664       8714       3564
## FBgn0000018      722      299      308        583        761        245
##             untreated4
## FBgn0000003          0
## FBgn0000008         70
## FBgn0000014          0
## FBgn0000015          2
## FBgn0000017       3150
## FBgn0000018        310

DESeq2

We can create a DESeq2 dataset and then extract the normalized counts for visualization.

# create a DESeq2 dataset from the metadata and counts
dds <- DESeq(DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition))
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
normCounts <- counts(dds, normalized = TRUE)
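
Normalized counts are the raw counts divided by a per-sample size factor. As a rough illustration of what that means, here is a sketch of the median-of-ratios calculation (the default size factor method in DESeq2) in base R on made-up numbers. The object names are hypothetical, and this is not a substitute for calling counts(dds, normalized = TRUE).

```r
# Toy counts (hypothetical): sample s2 is sequenced 2x as deeply as s1, s3 3x
counts_toy <- matrix(
  c(10,  20,  30,
    100, 200, 300,
    5,   10,  15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# per-gene log geometric mean across samples
log_geo_means <- rowMeans(log(counts_toy))

# per-sample size factor: median ratio of each count to its gene's
# geometric mean, skipping genes with a zero count in any sample
size_factors <- apply(counts_toy, 2, function(sample_counts) {
  usable <- is.finite(log_geo_means) & sample_counts > 0
  exp(median((log(sample_counts) - log_geo_means)[usable]))
})

# divide each column by its size factor
norm_counts_toy <- sweep(counts_toy, 2, size_factors, "/")
```

Because the toy samples differ only in depth, the normalized columns come out equal, which is the intended effect of the normalization.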

Given that embedding multiple widgets with 14599 rows of data may lead to performance issues, the user may want to focus on genes of interest as determined by the differential expression experiment:

res <- results(dds, tidy = TRUE) %>% 
  data.frame %>% 
  filter(padj < 0.05)

normCountsSig <- normCounts %>%
  data.frame %>% 
  filter(rownames(.) %in% res$row)
  
normCountsSig %>% nrow
## [1] 845
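
For readers who prefer to avoid the pipe, the same subsetting can be sketched in base R. The example below uses made-up toy objects (norm_toy, res_toy are hypothetical names) with the same layout as the objects above, where the tidy results table stores gene IDs in a column called row. Adjusted p-values from DESeq2 can be NA, and dplyr's filter drops such rows, so the sketch excludes them explicitly.

```r
# Toy normalized counts (hypothetical): 3 genes x 2 samples
norm_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Toy results table laid out like results(dds, tidy = TRUE)
res_toy <- data.frame(
  row  = c("geneA", "geneB", "geneC"),
  padj = c(0.01, NA, 0.20)
)

# IDs of genes passing the threshold; NA adjusted p-values are excluded
sig_ids <- res_toy$row[!is.na(res_toy$padj) & res_toy$padj < 0.05]

# keep only the significant rows; drop = FALSE preserves the matrix shape
norm_sig_toy <- norm_toy[rownames(norm_toy) %in% sig_ids, , drop = FALSE]
```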

Examples

Two-factor Examples

We pass the dataset and the annotation file to ContrApption, resulting in a widget with a dropdown menu of features and boxplots of the input data grouped as directed.

ContrApption(data = normCountsSig, annotation = coldata)

Users may create multiple widgets per document. In this experiment there is more than one experimental factor, so we can create another widget based on a different group. We may also use the optional y-axis label and plot title arguments.

ContrApption(
  data = normCountsSig,
  annotation = coldata,
  plotName = "DESeq2 normalized counts by sample types",
  yAxisName = "counts"
)

Interaction-style example

ContrApption operates on experimental factors with an arbitrary number of levels. We can use this to subset by multiple factors of interest by merging the columns of the annotation data.

# derive a new variable - every combination of condition and type
coldata$interaction <- paste0(coldata$condition, ":", coldata$type)

head(coldata)
##            condition        type           interaction
## treated1     treated single-read   treated:single-read
## treated2     treated  paired-end    treated:paired-end
## treated3     treated  paired-end    treated:paired-end
## untreated1 untreated single-read untreated:single-read
## untreated2 untreated single-read untreated:single-read
## untreated3 untreated  paired-end  untreated:paired-end
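
Before plotting by a merged factor, it can help to count how many samples fall into each combined level, since a level with one sample yields a box drawn from a single point. The sketch below rebuilds the seven-sample annotation shown earlier by hand (anno_toy is a hypothetical name) so that it runs without the pasilla package.

```r
# Annotation values copied from the coldata table above
anno_toy <- data.frame(
  condition = c("treated", "treated", "treated",
                "untreated", "untreated", "untreated", "untreated"),
  type = c("single-read", "paired-end", "paired-end",
           "single-read", "single-read", "paired-end", "paired-end"),
  row.names = c("treated1", "treated2", "treated3",
                "untreated1", "untreated2", "untreated3", "untreated4")
)

# same derivation as above: every combination of condition and type
anno_toy$interaction <- paste0(anno_toy$condition, ":", anno_toy$type)

# number of samples in each combined level
group_sizes <- table(anno_toy$interaction)
group_sizes
```

Here the treated:single-read level contains only one sample, which is worth keeping in mind when reading its box.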

We can then use the function normally:

# pass our new variable
ContrApption(data = normCountsSig, annotation = coldata)

Note that users interested in this sort of experiment would generally re-run DESeq2 with a different experimental design:

DESeq(DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition * type))

Crosstalk Integration

ContrApption is crosstalk-compatible for use with datatable. Users may leverage this by creating a shared data object and passing it to both the datatable and ContrApption functions. Selecting a row in the datatable triggers ContrApption to visualize that row.

normCountsSigShared <- SharedData$new(normCountsSig)
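
To see what the shared object carries, here is a small sketch on a made-up data frame (toy and shared_toy are hypothetical names). It relies on crosstalk's documented default of using the row names as keys when no key is supplied; how ContrApption itself consumes those keys is not shown here.

```r
library(crosstalk)

# Toy data (hypothetical): 2 genes x 2 samples
toy <- data.frame(
  s1 = c(1, 2), s2 = c(3, 4),
  row.names = c("geneA", "geneB")
)

# wrap the data frame so linked widgets can share selections
shared_toy <- SharedData$new(toy)

# with no key argument, the row names serve as the keys
keys <- shared_toy$key()

# the underlying data frame remains accessible
original <- shared_toy$origData()
```

Keeping gene IDs as row names is therefore what lets a selected table row be matched to a feature.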

The user may also pass the two widgets to the bscols function and specify the bscolSize argument to ContrApption so that it resizes appropriately:

x <- ContrApption(data = normCountsSigShared, annotation = coldata)
# creates a side-by-side pair of widgets
bscols(
  widths = c(9, 3),
  # first column
  datatable(
    normCountsSigShared,
    extensions = "Scroller",
    style = "bootstrap",
    class = "compact",
    width = "100%",
    selection = 'single',
    options = list(
        deferRender = TRUE,
        scrollY = 300,
        scroller = TRUE
      )
    ),
  # second column
  x
)
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] pasilla_1.18.1              DESeq2_1.30.1              
##  [3] SummarizedExperiment_1.20.0 Biobase_2.50.0             
##  [5] MatrixGenerics_1.2.1        matrixStats_0.58.0         
##  [7] GenomicRanges_1.42.0        GenomeInfoDb_1.26.4        
##  [9] IRanges_2.24.1              S4Vectors_0.28.1           
## [11] BiocGenerics_0.36.0         crosstalk_1.1.1            
## [13] DT_0.17                     dplyr_1.0.5                
## [15] ContrApption_0.0.2.9000    
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6           fs_1.5.0               usethis_2.0.1         
##  [4] devtools_2.3.2         bit64_4.0.5            RColorBrewer_1.1-2    
##  [7] httr_1.4.2             rprojroot_2.0.2        tools_4.0.4           
## [10] utf8_1.2.1             R6_2.5.0               DBI_1.1.1             
## [13] colorspace_2.0-0       withr_2.4.1            tidyselect_1.1.0      
## [16] prettyunits_1.1.1      processx_3.4.5         bit_4.0.4             
## [19] compiler_4.0.4         cli_2.3.1              desc_1.3.0            
## [22] DelayedArray_0.16.2    scales_1.1.1           genefilter_1.72.1     
## [25] callr_3.5.1            stringr_1.4.0          digest_0.6.27         
## [28] rmarkdown_2.7          XVector_0.30.0         pkgconfig_2.0.3       
## [31] htmltools_0.5.1.1      sessioninfo_1.1.1      fastmap_1.1.0         
## [34] htmlwidgets_1.5.3      rlang_0.4.10           RSQLite_2.2.4         
## [37] generics_0.1.0         jsonlite_1.7.2         BiocParallel_1.24.1   
## [40] RCurl_1.98-1.2         magrittr_2.0.1         GenomeInfoDbData_1.2.4
## [43] Matrix_1.3-2           Rcpp_1.0.6             munsell_0.5.0         
## [46] fansi_0.4.2            lifecycle_1.0.0        stringi_1.5.3         
## [49] yaml_2.2.1             zlibbioc_1.36.0        pkgbuild_1.2.0        
## [52] grid_4.0.4             blob_1.2.1             crayon_1.4.1          
## [55] lattice_0.20-41        splines_4.0.4          annotate_1.68.0       
## [58] locfit_1.5-9.4         knitr_1.31             ps_1.6.0              
## [61] pillar_1.5.1           geneplotter_1.68.0     pkgload_1.2.0         
## [64] XML_3.99-0.5           glue_1.4.2             evaluate_0.14         
## [67] remotes_2.2.0          vctrs_0.3.6            testthat_3.0.2        
## [70] gtable_0.3.0           purrr_0.3.4            assertthat_0.2.1      
## [73] cachem_1.0.4           ggplot2_3.3.3          xfun_0.22             
## [76] xtable_1.8-4           survival_3.2-7         tibble_3.1.0          
## [79] AnnotationDbi_1.52.0   memoise_2.0.0          ellipsis_0.3.1